PGS Catalog Calculator installation tutorial

Author

Benjamin Wingfield

Published

May 29, 2024

Who is this tutorial for?

Before starting this tutorial, it helps to understand what polygenic scores (PGS) are, why they’re useful, and what the PGS Catalog is.

The PGS Catalog Calculator (pgsc_calc) doesn’t have a graphical interface, so you’ll need to be able to open a terminal and run some commands to complete this tutorial.

What will I achieve?

By the end of this tutorial you’ll have:

  • Installed pgsc_calc on your computer
  • Calculated some test PGS

What resources do I need?

You’ll need a computer with:

  • A modern version of Linux 🐧 or macOS 🍏
  • A good amount of RAM (16GB or more preferred)
  • Permission to install software on your machine

Getting started

Install Docker

If you already have Docker installed on your computer, you can skip this section.

Important

If you’re using Docker Desktop on macOS, please make sure you’re running v4.30.0 or later. Otherwise, you might experience a “segmentation fault” error when running the calculator.
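If you’d like to compare version numbers in a terminal, here’s a minimal sketch. It assumes you’ve already looked up your Docker Desktop version and that it’s written as plain dotted numbers like 4.31.0. The version_ge function is a hypothetical helper, not part of Docker, and the installed value is only an example.

```shell
# version_ge HAVE NEED: succeed if dotted version HAVE is at least NEED.
# Hypothetical helper; assumes every component is a plain number.
version_ge() {
    have=$1
    need=$2
    while [ -n "$have" ] || [ -n "$need" ]; do
        h=${have%%.*}   # leading component of each version
        n=${need%%.*}
        h=${h:-0}       # a missing component counts as 0, so 4.30 equals 4.30.0
        n=${n:-0}
        if [ "$h" -gt "$n" ]; then return 0; fi
        if [ "$h" -lt "$n" ]; then return 1; fi
        # Drop the component that was just compared.
        case $have in *.*) have=${have#*.} ;; *) have= ;; esac
        case $need in *.*) need=${need#*.} ;; *) need= ;; esac
    done
    return 0
}

installed="4.31.0"   # example value only: replace with your own version
if version_ge "$installed" "4.30.0"; then
    echo "Docker Desktop $installed is new enough"
else
    echo "Docker Desktop $installed is older than 4.30.0, please update"
fi
```

Components are compared as numbers rather than as text, so 4.9.0 is correctly treated as older than 4.30.0.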

If you need to install Docker, follow the steps below.

We use Docker to ship our software in containers that run on your computer. There are a lot of different tools for working with containers, but Docker is the most popular.

The simplest way to install Docker is by downloading Docker Desktop.

Have you tried turning it off and on again?

You might need to restart your computer after you’ve installed Docker for the first time.

You can check whether Docker Desktop is running by opening a terminal and running:

$ docker run hello-world

You should see something that looks like:

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (arm64v8)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/

Install Nextflow

If you already have Nextflow installed on your computer, you can skip this section.

Tip

If you already have Nextflow installed, try running nextflow self-update to grab the latest version.

If you need to install Nextflow, follow the steps below.

pgsc_calc is built using Nextflow, so you’ll need Nextflow installed on your computer to run it. These installation steps are taken from the Nextflow documentation.

First, check if you have at least Java 11 installed on your computer:

$ java -version # java 21, looks good!
openjdk version "21.0.2" 2024-01-16

If you don’t have Java installed, check the Nextflow documentation for next steps.
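The same check can be scripted. This is a rough sketch: java_major is a hypothetical helper that assumes the version appears in double quotes after the word version, as in the output above. Older Java releases report versions like 1.8.0, which the sketch treats as Java 8.

```shell
# java_major TEXT: print the major Java version found in `java -version` output.
# Hypothetical helper; assumes a line such as: openjdk version "21.0.2" 2024-01-16
java_major() {
    # Take the first line mentioning a version and keep the quoted part.
    version=$(printf '%s\n' "$1" | grep 'version "' | head -n 1 |
        sed 's/^[^"]*"\([^"]*\)".*$/\1/')
    case $version in
        1.*) version=${version#1.} ;;   # legacy scheme: 1.8.0_292 means Java 8
    esac
    # The major version is everything before the first dot.
    printf '%s\n' "${version%%.*}"
}

# Sample text copied from the output above. To check your own machine, use:
#   java_major "$(java -version 2>&1)"    (java prints its version to stderr)
major=$(java_major 'openjdk version "21.0.2" 2024-01-16')
if [ "$major" -ge 11 ]; then
    echo "Java $major is new enough for Nextflow"
else
    echo "Java $major is too old, Nextflow needs at least Java 11"
fi
```

The demonstration uses the sample text so that it behaves the same on every machine; swap in the commented command to test your own installation.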

Then, run:

$ curl -s https://get.nextflow.io | bash

This will create a file called nextflow in the current directory.

Make the file executable:

$ chmod +x nextflow

Then move the file to a directory on your PATH so your computer can find the nextflow program:

$ sudo mv nextflow /usr/local/bin

Then check if nextflow is installed correctly:

$ nextflow info

You should see something like:

  Version: 23.10.1 build 5891
  Created: 12-01-2024 22:01 UTC (22:01 BST)
  System: Mac OS X 14.4.1
  Runtime: Groovy 3.0.19 on OpenJDK 64-Bit Server VM 21.0.2
  Encoding: UTF-8 (UTF-8)

Tip

If you’d like to learn more about Nextflow, the nf-core community has a great explanation.

Checklist

After you’ve completed this checklist, you’ll be ready to install pgsc_calc and calculate some polygenic scores 🧬

  • The Docker Desktop application is open and running
  • Running docker run hello-world in a terminal shows:
Hello from Docker!
This message shows that your installation appears to be working correctly.
...
  • Running nextflow run hello in a terminal shows:
N E X T F L O W  ~  version 23.10.1
Launching `https://github.com/nextflow-io/hello` [sleepy_wilson] DSL2 - revision: 7588c46ffe [master]
...
Hola world!

Ciao world!

Bonjour world!

Hello world!
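Part of this checklist can be checked automatically. The sketch below only tests whether each command can be found on your PATH. It can’t tell whether Docker Desktop is actually running, so the docker run hello-world check above is still worth doing. missing_tools is a hypothetical helper name.

```shell
# missing_tools NAME...: print each named command that is not found on the PATH.
# Hypothetical helper; it does not check versions or whether Docker is running.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools docker java nextflow)
if [ -z "$missing" ]; then
    echo "docker, java and nextflow were all found"
else
    # Left unquoted on purpose so the names are printed on one line.
    echo "Not found on PATH:" $missing
fi
```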

Calculate some polygenic scores

To install pgsc_calc and calculate some PGS, you can run the calculator using the test profile:

$ nextflow run pgscatalog/pgsc_calc -profile test,docker --outdir results

When you try this for the first time it can take a few minutes to finish, depending on the speed of your internet connection. The calculator code and containers are being downloaded and cached in the background.

Important

If your computer uses the ARM architecture (like Apple silicon Macs, M1 or later), you need to change -profile test,docker to -profile test,docker,arm in the example above.
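If you’re not sure which architecture your computer uses, the sketch below picks the profile for you. It assumes uname -m prints arm64 (macOS) or aarch64 (Linux) on ARM machines; a terminal running under emulation may report something else. pick_profile is a hypothetical helper, and the script only prints the command so you can review it before running it yourself.

```shell
# pick_profile MACHINE: print the -profile value for the test run.
# Hypothetical helper; the profile names are the ones used in this tutorial.
pick_profile() {
    case $1 in
        arm64|aarch64) printf '%s\n' "test,docker,arm" ;;
        *)             printf '%s\n' "test,docker" ;;
    esac
}

# uname -m reports the machine hardware name of the current shell.
profile=$(pick_profile "$(uname -m)")
echo "nextflow run pgscatalog/pgsc_calc -profile $profile --outdir results"
```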

Tip

If you’ve used the calculator before, please add the -latest option to the command above to grab the most recent release.

You should see output that looks like:

N E X T F L O W  ~  version 23.10.1
Pulling pgscatalog/pgsc_calc ...
 Already-up-to-date
Launching `https://github.com/pgscatalog/pgsc_calc` [chaotic_yalow] DSL2 - revision: 8bdf287d55 [main]

------------------------------------------------------
  pgscatalog/pgsc_calc v2.0.0-alpha.5-g8bdf287
------------------------------------------------------
Core Nextflow options
  revision                  : main
  runName                   : chaotic_yalow
  containerEngine           : docker
  launchDir                 : /Users/bwingfield/Documents/projects/pgsc_calc
  workDir                   : /Users/bwingfield/Documents/projects/pgsc_calc/work
  projectDir                : /Users/bwingfield/.nextflow/assets/pgscatalog/pgsc_calc
  userName                  : bwingfield
  profile                   : test,docker
  configFiles               :

Input/output options
  input                     : /Users/bwingfield/.nextflow/assets/pgscatalog/pgsc_calc/assets/examples/samplesheet.csv
  scorefile                 : /Users/bwingfield/.nextflow/assets/pgscatalog/pgsc_calc/assets/examples/scorefiles/PGS001229_22.txt
  outdir                    : /Users/bwingfield/.nextflow/assets/pgscatalog/pgsc_calc/results

Reference options
  ref_samplesheet           : /Users/bwingfield/.nextflow/assets/pgscatalog/pgsc_calc/assets/ancestry/reference.csv
  ld_grch37                 : /Users/bwingfield/.nextflow/assets/pgscatalog/pgsc_calc/assets/ancestry/high-LD-regions-hg19-GRCh37.txt
  ld_grch38                 : /Users/bwingfield/.nextflow/assets/pgscatalog/pgsc_calc/assets/ancestry/high-LD-regions-hg38-GRCh38.txt
  ancestry_checksums        : /Users/bwingfield/.nextflow/assets/pgscatalog/pgsc_calc/assets/ancestry/checksums.txt

Compatibility options
  target_build              : GRCh37

Max job request options
  max_cpus                  : 2
  max_memory                : 6.GB
  max_time                  : 6.h

Other parameters
  config_profile_name       : Test profile
  config_profile_description: Minimal test dataset to check pipeline function

!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
If you use pgscatalog/pgsc_calc for your analysis please cite:

* The Polygenic Score Catalog
  https://doi.org/10.1038/s41588-021-00783-5

* The nf-core framework
  https://doi.org/10.1038/s41587-020-0439-x

* Software dependencies
  https://github.com/pgscatalog/pgsc_calc/blob/master/CITATIONS.md

[b5/2b0584] Submitted process > PGSCATALOG_PGSCCALC:PGSCCALC:MAKE_COMPATIBLE:PLINK2_RELABELPVAR (cineca chromosome 22)
[54/5b2ef5] Submitted process > PGSCATALOG_PGSCCALC:PGSCCALC:INPUT_CHECK:COMBINE_SCOREFILES (1)
[22/8efca4] Submitted process > PGSCATALOG_PGSCCALC:PGSCCALC:MATCH:MATCH_VARIANTS (cineca chromosome 22)
[ee/f16912] Submitted process > PGSCATALOG_PGSCCALC:PGSCCALC:MATCH:MATCH_COMBINE (cineca)
[b9/aa6036] Submitted process > PGSCATALOG_PGSCCALC:PGSCCALC:APPLY_SCORE:PLINK2_SCORE (cineca chromosome 22 effect type additive 0)
[4d/c20fb0] Submitted process > PGSCATALOG_PGSCCALC:PGSCCALC:APPLY_SCORE:SCORE_AGGREGATE (cineca)
[f0/e5f070] Submitted process > PGSCATALOG_PGSCCALC:PGSCCALC:REPORT:SCORE_REPORT (cineca)
[59/71755b] Submitted process > PGSCATALOG_PGSCCALC:PGSCCALC:DUMPSOFTWAREVERSIONS (1)
-[pgscatalog/pgsc_calc] Pipeline completed successfully-

If you can see the message Pipeline completed successfully, that means you’ve:

  • Installed the PGS Catalog Calculator
  • Calculated some polygenic scores

Well done! 🎉

Tip

If things don’t look quite right, please open a discussion or issue on GitHub describing your problem.

Results

To look at the calculator outputs, check the results folder in the directory you ran the calculator from.

However, the test results are not biologically meaningful: they’re calculated from small synthetic data. The purpose of the test profile is to install the calculator and check that all of its components work correctly on your computer, so that it’s ready for imputed human genomes.

Next steps

If you’d like to apply the calculator to your genomic data, please check our documentation.

Important

The calculator currently supports imputed human genomes.

Using unimputed array or whole-genome sequencing data with the calculator will often result in errors. These errors aren’t bugs in the calculator; they’re caused by the format and structure of the input genomes.

There are some things you need to do to prepare your genomes for PGS calculation.

The documentation also contains things like:

Tip

This tutorial describes using the calculator with Docker because it’s the simplest approach on an ordinary computer.

However, some computing environments, such as HPC clusters or Trusted Research Environments (TREs), can’t use Docker, so the calculator also supports Singularity/Apptainer and Anaconda.

Please read the docs to find out more.